Title: A TUTORIAL INTRODUCTION TO THE C LANGUAGE RECIO LIBRARY
Copyright: (C) 1994-1996, William Pierpoint
Version: 2.14
Date: June 14, 1996
1.0 STDIO AND RECIO
The program many people learned when first introduced to the C programming
language was the "hello, world" program published in Kernighan and Ritchie's
"The C Programming Language." And the first line of that first program,
#include <stdio.h>
tells the compiler that the functions and macros provided by the standard
input/output library are needed for the program. The "hello, world" program
uses the powerful printf function for output. The counterpart for input,
scanf, looks deceptively like printf, but unfortunately has many ways to
trap an unwary programmer. These include failing to provide the address
of a variable, passing an argument whose size does not match the conversion
in the format string, and passing a number of arguments that does not match
the conversions in the format string.
Suppose you use a library that defines a boolean type as an unsigned
character. You develop an output module that writes variables of type
boolean to a file,
/* output */
boolean state=0;
...
fprintf(fp, "%6d", state);
where fp is a pointer to FILE. Once you get the output module working, you
decide to develop the input module to read back into the program the data
you wrote to disk.
/* input */
boolean state;
...
fscanf(fp, "%d", &state);
So, is this ok? On one compiler this worked consistently without problems,
but on another compiler, it overwrote the value in another variable. Why?
Because %d tells fscanf to expect the address of an int, not an unsigned
char, so it stores a whole int through the pointer. One compiler had placed
another variable in the adjoining memory and the other compiler apparently
had not. And since the arguments of a function with a variable number of
arguments are not checked against a prototype, you may not get any errors
or warnings. That is what is so infuriating about this type of error. You
see that another
variable has the wrong value, you check all the code that uses the other
variable, and you can't find anything wrong with it. In the midst of
development, it is hard to imagine that the problem is caused by code that
has nothing to do with the variable containing the bad value.
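Staying within stdio, one common way to avoid this trap is to read into a
variable whose type matches the conversion and only then assign it to the
narrower type. The following is a minimal sketch, assuming the boolean type
is a typedef for unsigned char; the typedef and the helper name read_boolean
are made up for illustration and belong to neither stdio nor recio.

```c
#include <stdio.h>

typedef unsigned char boolean;   /* assumed definition, for illustration */

/* Read one integer field from text and store it as a boolean.
   Returns 1 on success, 0 if no integer could be read.
   read_boolean is a hypothetical helper, not part of stdio or recio. */
int read_boolean(const char *text, boolean *state)
{
    int tmp;                      /* %d requires the address of an int */

    if (sscanf(text, "%d", &tmp) != 1)
        return 0;
    *state = (boolean)(tmp != 0); /* convert only after a good read */
    return 1;
}
```

Because the int-sized store goes into tmp, nothing next to state in memory
can be overwritten, whatever layout the compiler chooses.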
The recio (record input/output) library takes a different approach to input.
To input the boolean variable using the recio library, just write
/* input */
boolean state;
...
state = rgeti(rp);
where rp is a pointer to REC (the recio structure analogous to the stdio
FILE structure). The rgeti function gets an integer from the input and the
compiler converts it to a boolean when it makes the assignment. No need to
worry about crazy pointers here!
Since virtually every program has to do input or output, the stdio library
is very familiar to C programmers. Many functions in the recio library
are analogous to the stdio library. This makes the learning curve easier.
Analogous stdio/recio components
    stdio        recio
    ---------    ---------
    FILE         REC
    FOPEN_MAX    ROPEN_MAX
    stdin        recin
    stdout       recout
    stderr       recerr
    stdprn       recprn
    fopen        ropen
    fclose       rclose
    fgets        rgetrec
    fscanf       rgeti, rgetd, rgets, ...
    fprintf      rputi, rputd, rputs, ...
    clearerr     rclearerr
    fgetpos      rgetfldpos
    fsetpos      rsetfldpos
    feof         reof
    ferror       rerror
2.0 EXAMPLES
2.1 Line Input
One of the first things you can do with the recio library is to substitute
rgetrec() for fgets() to get a line of text (record) from a file (or standard
input). The advantage of rgetrec() is that you don't have to go to the
trouble to allocate space for a string buffer, or worry about the size of the
string buffer. The recio library handles that for you automatically. The
rgetrec function is like fgets() in that it gets a string from a stream, but
it is like gets() in that it trims off the trailing newline character.
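To see what rgetrec() saves you from writing, here is a rough stdio-only
sketch of a line reader that grows its own buffer and drops the trailing
newline. It is not how recio is implemented; read_line is a hypothetical
helper, and unlike the record buffer managed by recio, the caller here has
to free each line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a heap buffer that grows as
   needed, dropping the trailing newline.  Returns the buffer (the caller
   frees it), or NULL when no more input is available or memory runs out.
   read_line is a hypothetical helper; recio's internals may differ. */
char *read_line(FILE *fp)
{
    size_t size = 16, len = 0;
    char *buf = malloc(size);
    int ch;

    if (!buf)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= size) {            /* keep room for the '\0' */
            char *tmp = realloc(buf, size * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            size *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```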
The echo program demonstrates the use of the rgetrec function.
/* echo.c - echo input to output */
#include <stdio.h>
#include <stdlib.h>
#include "recio.h"
int main(void)
{
    /* while input continues to be available */
    while (rgetrec(recin)) {
        /* echo record buffer to output */
        puts(rrecs(recin));
    }
    /* if exited loop before end-of-file */
    if (!reof(recin)) {
        exit (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}
The echo program reads standard input using recin, the recio equivalent to
stdin. For output the recio library provides recout, recerr, and recprn.
The rgetrec function returns a pointer to the record buffer, but the echo
program did not use a variable to hold a pointer to the string (although
it could have). Instead, the record buffer was accessed through the rrecs
macro, which provides a pointer to the record buffer.
Since rgetrec returns NULL on either error or end-of-file, your program
needs to find out which condition occurred. You can use either the reof
function or the rerror function to determine this. The echo program uses
the reof function; the wc program in section 2.2 uses the rerror function.
The echo program just exits with a failure status if an error occurred
before the end of the file was reached.
2.2 Line, Word, and Character Counting
The power of the recio library comes from its facilities to break records
into fields and from the many functions that operate on fields. Because
the default field delimiter is the space character (which breaks on any
whitespace), the default behavior is equivalent to subdividing a line of
text into words.
The wc program counts lines, words, and characters for files specified
on the command line.
/* wc.c - count lines, words, characters */
#include <errno.h>   /* errno, ENOENT */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "recio.h"
int main(int argc, char *argv[])
{
    int nf;      /* number of files */
    REC *rp;     /* pointer to open record stream */
    long nc,     /* number of characters (not including line terminator) */
         nw,     /* number of words */
         nl;     /* number of lines */
    /* loop through all files */
    for (nf=1; nf < argc; nf++) {
        /* open record stream */
        rp = ropen(argv[nf], "r");
        if (!rp) {
            if (errno == ENOENT) {
                printf("ERROR: Could not open %s\n", argv[nf]);
                continue;
            } else {
                printf("FATAL ERROR: %s\n", strerror(errno));
                exit (EXIT_FAILURE);
            }
        }
        /* initialize */
        nc = nw = 0;
        rsetfldch(rp, ' ');
        rsettxtch(rp, ' ');
        /* loop through all lines (records) */
        while (rgetrec(rp)) {
            /* count number of characters in line w/o '\n' */
            nc += strlen(rrecs(rp));
            /* count number of words (fields) */
            nw += rnumfld(rp);
        }
        /* if exited loop on error rather than end-of-file */
        if (rerror(rp)) {
            printf("ERROR reading %s - %s\n",
                   rnames(rp), rerrstr(rp));
            exit (EXIT_FAILURE);
        }
        /* get number of lines (records) */
        nl = rrecno(rp);
        /* output results */
        printf("%s: %ld %ld %ld\n", rnames(rp), nl, nw, nc);
        /* close record stream */
        rclose(rp);
    }
    return (EXIT_SUCCESS);
}
If ropen() fails, the wc program goes to the trouble to check errno for
ENOENT rather than just assuming that the failure was caused by a missing
file.
The wc program also sets the field and text delimiters even though it is
unnecessary here since they are the same as the default values. If you
wanted to read a comma-delimited file, you could set the delimiters to
rsetfldch(rp, ',');
rsettxtch(rp, '"');
which allows you to also read text fields containing commas by delimiting
the text with quotes, such as "Hello, World."
Fields are counted using the rnumfld function, which counts all the fields
in the current record. In reading a data file, you could use rnumfld()
to count the number of fields each time a record is read. This would give
you a quick check that the expected number of fields was found prior to
processing the record.
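To make the interplay of the field and text delimiters concrete, the sketch
below counts the fields in one comma-delimited line while ignoring commas
that sit inside quoted text. It is written in plain C for illustration only;
count_fields is a hypothetical helper, and the exact rules rnumfld() applies
(for example to empty records) may differ.

```c
/* Count the fields in line, where fldch separates fields and a pair of
   txtch characters encloses text that may itself contain fldch.
   count_fields is a hypothetical helper written for this sketch; the
   rules recio itself applies may differ in detail. */
int count_fields(const char *line, char fldch, char txtch)
{
    int n, in_text = 0;
    const char *p;

    if (*line == '\0')
        return 0;                 /* treat an empty record as no fields */
    n = 1;
    for (p = line; *p != '\0'; p++) {
        if (*p == txtch)
            in_text = !in_text;   /* entering or leaving quoted text */
        else if (*p == fldch && !in_text)
            n++;                  /* delimiter outside quotes ends a field */
    }
    return n;
}
```

A caller that expects three fields per record could compare the result with
3 before converting any of the values.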
The recio library gives you more control over your input data than stdio.
If the last field is missing from a data file, fscanf() starts reading the
next line. In a file with a complex structure, it can be difficult to tell
where you are when something goes awry. Sometimes every record in a file
has a different format. The recio library has functions you can use to
always find out where you are. You only input the next record when you use
the rgetrec function.
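The line-crossing behavior of the scanf family is easy to reproduce with
sscanf on a two-line string. In the sketch below (both function names are
made up for this example), the first function scans across the newline,
while the second confines the scan to the first line, which is the
discipline that reading one record at a time gives you.

```c
#include <stdio.h>
#include <string.h>

/* Try to read three integers from the start of text with one scanf-style
   call.  White space in the format also matches newlines, so a short
   first line silently borrows a value from the next line. */
int scan_three(const char *text, int v[3])
{
    return sscanf(text, "%d %d %d", &v[0], &v[1], &v[2]);
}

/* Read up to three integers from the first line of text only: copy the
   line out, then scan just that copy.  Returns the number of values
   actually found on the line. */
int scan_three_in_line(const char *text, int v[3])
{
    char line[128];
    size_t n = strcspn(text, "\n");   /* length of the first line */

    if (n >= sizeof line)
        n = sizeof line - 1;
    memcpy(line, text, n);
    line[n] = '\0';
    return sscanf(line, "%d %d %d", &v[0], &v[1], &v[2]);
}
```

Given the text "1 2\n3 4 5\n", scan_three() reports three values (1, 2, 3)
and hides the missing field, whereas scan_three_in_line() reports two, so
the short record can be detected.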
The character count does not include any line termination characters. The
recio library strips these out of the record buffer.
2.3 Field Functions
For most programs you will want to use the recio functions that read or write
data. Functions are available to read and write integer, unsigned integer,
long, unsigned long, float, double, time (time_t and struct tm), character,
and string data types. There are two types of field functions: those that
deal with character delimited fields (such as comma-delimited) and those that
deal with column delimited fields (such as an integer between columns 1 and 5).
Each type is further divided in two: one for numeric data in base 10 and the
other for numeric data in any base from 2 to 36.
    Class    Description
    ------   -----------------------------------------------------
    r        character delimited fields; numeric data in base 10
    rc       column delimited fields; numeric data in base 10
    rb       character delimited fields; numeric data in any base
    rcb      column delimited fields; numeric data in any base
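As an illustration of what "column delimited" means, the following sketch
pulls a base 10 integer out of a fixed range of columns in a record. It uses
only the standard library; get_col_int is a hypothetical helper, and the
actual rc functions have their own interface, which is documented with the
library.

```c
#include <stdlib.h>
#include <string.h>

/* Return the base-10 integer found between columns begcol and endcol
   (1-based, inclusive) of line, or 0 if those columns hold no number.
   get_col_int is a hypothetical helper, not a recio function. */
long get_col_int(const char *line, size_t begcol, size_t endcol)
{
    char field[32];
    size_t len = strlen(line), width;

    if (begcol < 1 || endcol < begcol || begcol > len)
        return 0;
    if (endcol > len)
        endcol = len;             /* record is shorter than the field */
    width = endcol - begcol + 1;
    if (width >= sizeof field)
        width = sizeof field - 1;
    memcpy(field, line + begcol - 1, width);
    field[width] = '\0';
    return strtol(field, NULL, 10);
}
```

Because the field boundaries come from column positions rather than from a
delimiter character, adjacent numbers such as "12345678" can still be read
as two separate values (columns 1 to 4 and columns 5 to 8).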
A mnemonic system makes it easy to construct the name of any function you
want. All you need to remember is that there are four prefixes (one for each
class), two bodies (get reads data; put writes data), ten suffixes (one
for each data type), and that the rb and rcb prefixes are used only with the
i, l, ui, and ul suffixes.
    Prefix  Body  Suffix        Prefix  Body  Suffix
    ------  ----  ------        ------  ----  ------
      r     get     c             rb    get     i
      rc    put     d             rcb   put     l
                    f                           ui
                    i                           ul
                    l
                    s
                    t
                    tm
                    ui
                    ul
Example: The rbgetui() function takes record pointer and base arguments,
and returns an unsigned integer.
Additional information on these functions is found in the text file SPEC.TXT.
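For readers unfamiliar with numeric input in bases other than 10, the
standard strtoul function accepts the same 2 to 36 range. The sketch below
wraps it with some checking; parse_ul_base is a hypothetical helper shown
only to illustrate the idea, and it says nothing about how the rb and rcb
functions are implemented.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to an unsigned long in the given base (2 to 36).
   Returns 1 and stores the value on success; returns 0 if the base is
   out of range, the text holds no digits, other characters follow the
   digits, or the value is too large.
   parse_ul_base is a hypothetical helper built on the standard strtoul. */
int parse_ul_base(const char *text, int base, unsigned long *value)
{
    char *end;
    unsigned long v;

    if (base < 2 || base > 36)
        return 0;
    errno = 0;
    v = strtoul(text, &end, base);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;
    *value = v;
    return 1;
}
```

In bases above 10 the letters a through z stand for the digit values 10
through 35, so "ff" in base 16 is 255 and "z" in base 36 is 35.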
2.4 Error Handling
Rather than checking errno and rerror() for errors after each call to a recio
function, or checking the return value from those functions that return an
error value, you can register a callback error function using the rseterrfn
function. The error function gives you one convenient place to handle all
recio errors. As you write your error function, you will find that the recio
library provides many useful functions for determining and reporting the
location and type of error.
The dif program reads through two files line by line looking for the first
difference between the two files. It uses a very simple callback error
function that just reports the error and then exits the program.
/* dif.c - locate line where two text files first differ */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "recio.h"
/* simple callback error function */
void rerrfn(REC *rp)
{
    if (risvalid(rp)) {
        fprintf(stderr, "FATAL ERROR: %s - %s\n",
                rnames(rp), rerrstr(rp));
    } else {
        fprintf(stderr, "FATAL ERROR: %s\n", strerror(errno));
    }
    exit(2);
}
/* file open failure error function */
void fopenerr(char *filename)
{
    fprintf(stderr, "FATAL ERROR: %s - %s\n",
            filename, strerror(errno));
    exit(2);
}
int main(int argc, char *argv[])
{
    int errorlevel=1; /* return errorlevel (1=files different; 0=same) */
    REC *rp1;         /* record pointer for first file */
    REC *rp2;         /* record pointer for second file */
    if (argc != 3) {
        fprintf(stderr, "Usage: dif file1 file2\n");
        exit(2);
    }
    /* register callback error function */
    rseterrfn(rerrfn);
    /* open first record stream */
    rp1 = ropen(argv[1], "r");
    if (!rp1 && errno == ENOENT) fopenerr(argv[1]);
    /* open second record stream */
    rp2 = ropen(argv[2], "r");
    if (!rp2 && errno == ENOENT) fopenerr(argv[2]);
    /* read files line by line */
    for (;;) {
        rgetrec(rp1);
        rgetrec(rp2);
        /* if neither file has reached the end */
        if (!reof(rp1) && !reof(rp2)) {
            if (strcmp(rrecs(rp1), rrecs(rp2))) {
                printf("Files first differ at line %ld\n\n", rrecno(rp1));
                printf("%s:\n%s\n\n", rnames(rp1), rrecs(rp1));
                printf("%s:\n%s\n\n", rnames(rp2), rrecs(rp2));
                break;
            }
        /* if file 1 ended first */
        } else if (reof(rp1) && !reof(rp2)) {
            printf("File %s ends before file %s\n\n",
                   rnames(rp1), rnames(rp2));
            break;
        /* if file 2 ended first */
        } else if (!reof(rp1) && reof(rp2)) {
            printf("File %s ends before file %s\n\n",
                   rnames(rp2), rnames(rp1));
            break;
        /* else both files have reached the end simultaneously */
        } else {
            printf("Files %s and %s are identical\n",
                   rnames(rp1), rnames(rp2));
            errorlevel = 0;
            break;
        }
    }
    rcloseall();
    return errorlevel;
}
One important kind of error is the data error. Data errors occur when data
values are too large or too small, when fields contain illegal characters,
or when data is missing. Your error function can correct data errors either
through algorithms (such as the rfix functions used in the test programs)
or by asking the user for a replacement value.
For an example of a callback error function that handles data errors, see
the TESTCHG.C source code. A skeleton code structure for a callback error
function is given in the file DESIGN.TXT.
2.5 Warnings
Warnings are less serious than errors and for some programs you may well
decide that they do not need to be considered. The primary differences
between errors and warnings come into play if you decide to ignore them
by not registering callback error and warning functions. On error, the
recio library stores the error number and stops reading or writing the
record stream. On warning, the recio library continues to read or write
the record stream, and only stores the warning number until another
warning comes along to replace it.
If you check the error number just before closing a record stream, you get
the number of the first error encountered. On the other hand, if you check
the warning number at this point, you get the last warning. Warnings are
handled with a set of routines analogous to the error handling routines.
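The "first error, last warning" rule can be pictured with a small status
record. The sketch below is only a model of the behavior described above;
the struct and function names are invented for this example and do not
reflect how the REC structure is actually laid out.

```c
/* Hypothetical status record for one stream.  recio's REC is not laid
   out this way; the sketch only mirrors the behavior described above. */
struct status {
    int errnum;    /* first error seen, 0 if none */
    int warnnum;   /* most recent warning, 0 if none */
};

/* Record an error: only the first one sticks, because the stream stops
   being read or written once an error is pending. */
void note_error(struct status *st, int errnum)
{
    if (st->errnum == 0)
        st->errnum = errnum;
}

/* Record a warning: each new warning replaces the previous one, and
   processing carries on. */
void note_warning(struct status *st, int warnnum)
{
    st->warnnum = warnnum;
}

/* Nonzero while the stream may still be read or written. */
int can_continue(const struct status *st)
{
    return st->errnum == 0;
}
```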
What kinds of warnings can you get? One warning lets you know if you have
read in an empty data string. Another warning occurs if you write to a
columnar field and the width between the columns is too small to hold the
data. You will find some examples of callback warning functions in the
source code for the test programs. A skeleton code structure for a callback
warning function is given in the file DESIGN.TXT.
Simple callback error and warning functions rerrmsg and rwarnmsg are now
included in the RECIO library. In the initial prototyping stages of
development, you may wish to use these functions rather than taking the
time to develop your own functions. Later you can substitute more robust
callback functions.
3.0 WHAT NOW?
That's it for this brief introduction. Next, if you haven't already done
so, spend a few minutes running the test programs and perusing the test
source code. Then to study the recio functions in more detail, move on to
the remaining documentation.
4.0 REFERENCES
Kernighan, B.W. and Ritchie, D.M. The C Programming Language, Second
Edition. Prentice Hall, Englewood Cliffs, NJ, 1988.